NO-JIRA: certregenerationcontroller: Improve goroutine mgmt #1981

Open
tchap wants to merge 2 commits into openshift:main from tchap:certregeneration-refactoring

Conversation

@tchap
Contributor

@tchap tchap commented Dec 10, 2025

Also use typed queues; the untyped ones are deprecated.

This is just a refactoring that handles some TODOs.
I took the same approach as the core k8s controllers.

Summary by CodeRabbit

  • Refactor
    • Modernized controller infrastructure to a shared factory-based controller for simpler reconciliation and informer wiring.
    • Replaced ad‑hoc queue/workers with standardized lifecycle handling, removing custom run/worker/queue management.
    • Replaced unmanaged goroutines with tracked worker lifecycle and coordinated shutdown to improve startup/shutdown reliability.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Dec 10, 2025
@openshift-ci-robot

@tchap: This pull request explicitly references no jira issue.


@coderabbitai

coderabbitai Bot commented Dec 10, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Replaces a bespoke single-item workqueue CA bundle controller with a library-go factory controller (unexported controller type, changed sync signature, removed Run), and updates command lifecycle to use a cancellable context plus sync.WaitGroup and tracked goroutines.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CA Bundle Controller (controller refactor): pkg/cmd/certregenerationcontroller/cabundlesyncer.go | Replaced custom workqueue and Run/worker lifecycle with factory.New().WithInformers(...).WithSync(...).ToController(...); renamed exported CABundleController → unexported caBundleController; updated sync signature to func (c *caBundleController) sync(ctx context.Context, _ factory.SyncContext) error; NewCABundleController now returns a factory.Controller (error removed); removed explicit cache sync, queue handling, and event-handler enqueueing. |
| Command lifecycle (goroutine coordination): pkg/cmd/certregenerationcontroller/cmd.go | Introduced context.WithCancel and sync.WaitGroup; replaced raw go launches with wg.Go(...) wrappers; removed unused error from controller constructor call; run controllers with explicit worker counts (e.g., Run(ctx, 1)); deferred cancel() and wg.Wait() to coordinate shutdown; informer startup uses the derived cancellable context. |
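
The wg.Go(...) wrappers listed in the summary above rely on sync.WaitGroup.Go, which is available in Go 1.25 and later. For readers on older toolchains, here is a minimal sketch of an equivalent helper. The names goTracked and runAll are hypothetical and used only for illustration; they do not appear in the PR, and the blocking worker is a stand-in for a real controller's Run call.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// goTracked is a hypothetical helper: it registers fn with wg before
// launching it, so wg.Wait() cannot return while fn is still running.
func goTracked(wg *sync.WaitGroup, fn func()) {
	wg.Add(1)
	go func() {
		defer wg.Done()
		fn()
	}()
}

// runAll starts n workers that block until ctx is cancelled, cancels the
// context, and returns how many workers had exited once wg.Wait() returned.
func runAll(n int) int {
	ctx, cancel := context.WithCancel(context.Background())
	var wg sync.WaitGroup
	var mu sync.Mutex
	exited := 0

	for i := 0; i < n; i++ {
		goTracked(&wg, func() {
			<-ctx.Done() // stand-in for a controller's Run(ctx, 1) (assumption)
			mu.Lock()
			exited++
			mu.Unlock()
		})
	}

	cancel()  // signal every worker to stop
	wg.Wait() // block until all of them have returned
	return exited
}

func main() {
	fmt.Println(runAll(3))
}
```

Because every worker is registered before it starts, the count returned after wg.Wait() always equals the number of workers launched.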

Sequence Diagram(s)

sequenceDiagram
    participant Informer as Informer(s)
    participant Factory as factory.Controller
    participant Sync as caBundleController.sync
    participant KubeAPI as Kubernetes API

    Informer->>Factory: inform event (ConfigMap change)
    Factory->>Sync: invoke sync(ctx, factory.SyncContext)
    Sync->>KubeAPI: read/update ConfigMaps/Secrets
    KubeAPI-->>Sync: response
    Sync-->>Factory: sync complete

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 11 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (11 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main objective of the PR: improving goroutine management by refactoring to use typed controllers and sync.WaitGroup instead of untracked goroutines. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | Pull request modifies only implementation files without adding or modifying test files, so test name evaluation is not applicable. |
| Test Structure And Quality | ✅ Passed | PR modifies only implementation files (cabundlesyncer.go and cmd.go) with no changes to test files or test structure. |
| Microshift Test Compatibility | ✅ Passed | PR modifies only controller implementation files (cabundlesyncer.go, cmd.go), not e2e test files. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | The pull request modifies controller implementation files (cabundlesyncer.go, cmd.go) in pkg/cmd/certregenerationcontroller/ and adds no Ginkgo e2e test files. The modified files are non-test code containing controller logic. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Pull request contains only controller implementation refactoring with no scheduling constraints, affinity rules, topology constraints, or replica logic changes. |
| Ote Binary Stdout Contract | ✅ Passed | klog calls occur in runtime controller methods, not process-level code, complying with OTE Binary Stdout Contract. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | No Ginkgo e2e test patterns found in the modified production code files. |


@openshift-ci openshift-ci Bot requested review from deads2k and dinhxuanvu December 10, 2025 11:23

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
pkg/cmd/certregenerationcontroller/cmd.go (1)

127-127: Minor: Duplicate configInformers.Start() call.

configInformers.Start(ctx.Done()) is called at line 127 and again at line 168. While SharedInformerFactory.Start() is idempotent (safe to call multiple times), the first call could be removed since line 168 starts all informers after resources are set up.

-	configInformers.Start(ctx.Done())
-
 	wg.Add(1)
 	go func() {
 		defer wg.Done()
 		featureGateAccessor.Run(ctx)
 	}()

Also applies to: 168-168
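
To illustrate the idempotency claim in the comment above, here is a toy model of a factory whose Start only launches informers it has not launched before. This is a sketch under stated assumptions, not the client-go implementation: toyFactory, Register, and the informer names are hypothetical, and the real factory starts informers in goroutines under a lock, whereas this model runs them synchronously for clarity.

```go
package main

import "fmt"

// toyFactory is a toy model of an informer factory (hypothetical type).
type toyFactory struct {
	informers map[string]func() // registered informers, keyed by name
	started   map[string]bool   // informers that Start has already launched
}

func newToyFactory() *toyFactory {
	return &toyFactory{
		informers: map[string]func(){},
		started:   map[string]bool{},
	}
}

// Register adds an informer; it does not start it.
func (f *toyFactory) Register(name string, run func()) {
	f.informers[name] = run
}

// Start launches every registered informer that has not been started yet
// and returns how many it launched on this call.
func (f *toyFactory) Start() int {
	launched := 0
	for name, run := range f.informers {
		if f.started[name] {
			continue // already running: a repeated Start is a no-op for it
		}
		run()
		f.started[name] = true
		launched++
	}
	return launched
}

func main() {
	f := newToyFactory()
	f.Register("featuregates", func() {})
	fmt.Println(f.Start()) // launches the one registered informer

	f.Register("clusteroperators", func() {})
	fmt.Println(f.Start()) // launches only the newly registered informer
	fmt.Println(f.Start()) // nothing new to launch
}
```

In this model a second Start is harmless for informers that are already running, but it still matters for informers registered after the first call, which is one reason an early and a late Start can both appear in the same function.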

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 2c58d2f and d93c607.

📒 Files selected for processing (2)
  • pkg/cmd/certregenerationcontroller/cabundlesyncer.go (4 hunks)
  • pkg/cmd/certregenerationcontroller/cmd.go (3 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**

⚙️ CodeRabbit configuration file

-Focus on major issues impacting performance, readability, maintainability and security. Avoid nitpicks and avoid verbosity.

Files:

  • pkg/cmd/certregenerationcontroller/cabundlesyncer.go
  • pkg/cmd/certregenerationcontroller/cmd.go
🔇 Additional comments (4)
pkg/cmd/certregenerationcontroller/cabundlesyncer.go (2)

40-40: LGTM! Correct migration to typed queue.

The refactor from the deprecated RateLimitingInterface to TypedRateLimitingInterface[string] is properly implemented with the corresponding constructor and config changes.

Also applies to: 52-57
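
As a rough illustration of what the typed queue buys, the sketch below contrasts an untyped queue, where every consumer needs a type assertion, with a generic one, where the compiler enforces the item type. Both queue types here are hypothetical stand-ins written with the standard library only; they are not the client-go workqueue types and omit rate limiting and deduplication.

```go
package main

import "fmt"

// untypedQueue mimics the older style: items are stored as `any`,
// so every consumer has to assert the concrete type.
type untypedQueue struct{ items []any }

func (q *untypedQueue) Add(item any) { q.items = append(q.items, item) }

func (q *untypedQueue) Get() (any, bool) {
	if len(q.items) == 0 {
		return nil, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

// typedQueue is the generic counterpart: the item type is fixed at
// construction, so Get returns T directly.
type typedQueue[T comparable] struct{ items []T }

func (q *typedQueue[T]) Add(item T) { q.items = append(q.items, item) }

func (q *typedQueue[T]) Get() (T, bool) {
	var zero T
	if len(q.items) == 0 {
		return zero, false
	}
	item := q.items[0]
	q.items = q.items[1:]
	return item, true
}

func main() {
	uq := &untypedQueue{}
	uq.Add("key")
	raw, _ := uq.Get()
	key, ok := raw.(string) // a wrong type would only be caught at run time
	fmt.Println(key, ok)

	tq := &typedQueue[string]{}
	tq.Add("key")
	typedKey, _ := tq.Get() // already a string; tq.Add(42) would not compile
	fmt.Println(typedKey)
}
```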


82-105: LGTM! Well-structured goroutine lifecycle management.

The shutdown sequence is correct:

  1. c.queue.ShutDown() signals the worker to exit (causes Get() to return quit=true)
  2. wg.Wait() ensures the worker completes before returning

The HandleCrashWithContext(ctx) is a good improvement for context-aware crash handling.
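
The two-step shutdown sequence described above can be sketched with the standard library alone. miniQueue and runOnce are hypothetical names; the queue is a simplified stand-in that only mirrors the shape of Get() returning a quit flag, and it assumes that items added before ShutDown() are still drained before quit is reported.

```go
package main

import (
	"fmt"
	"sync"
)

// miniQueue is a simplified stand-in for a work queue (hypothetical type).
type miniQueue struct {
	mu       sync.Mutex
	cond     *sync.Cond
	items    []string
	shutDown bool
}

func newMiniQueue() *miniQueue {
	q := &miniQueue{}
	q.cond = sync.NewCond(&q.mu)
	return q
}

// Add enqueues an item unless the queue has been shut down.
func (q *miniQueue) Add(item string) {
	q.mu.Lock()
	defer q.mu.Unlock()
	if q.shutDown {
		return
	}
	q.items = append(q.items, item)
	q.cond.Signal()
}

// Get blocks until an item is available or the queue is shut down.
// quit is true only once the queue is shut down and empty.
func (q *miniQueue) Get() (item string, quit bool) {
	q.mu.Lock()
	defer q.mu.Unlock()
	for len(q.items) == 0 && !q.shutDown {
		q.cond.Wait()
	}
	if len(q.items) == 0 {
		return "", true
	}
	item, q.items = q.items[0], q.items[1:]
	return item, false
}

// ShutDown wakes any blocked Get so the worker can observe quit.
func (q *miniQueue) ShutDown() {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.shutDown = true
	q.cond.Broadcast()
}

// runOnce processes keys with a single worker and returns what it handled.
func runOnce(keys []string) []string {
	q := newMiniQueue()
	var wg sync.WaitGroup
	var processed []string

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			key, quit := q.Get()
			if quit {
				return
			}
			processed = append(processed, key)
		}
	}()

	for _, k := range keys {
		q.Add(k)
	}
	q.ShutDown() // step 1: make Get return quit=true once the queue is empty
	wg.Wait()    // step 2: do not return until the worker has exited
	return processed
}

func main() {
	fmt.Println(runOnce([]string{"a", "b", "c"}))
}
```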

pkg/cmd/certregenerationcontroller/cmd.go (2)

121-133: LGTM! Correct defer ordering for graceful shutdown.

The pattern is correct: defer cancel() after defer wg.Wait() ensures cancel executes first (LIFO), signaling goroutines to stop before waiting. The comment at line 123 accurately documents this requirement.
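
The defer ordering described above can be checked with a small, self-contained sketch. shutdownOrder is a hypothetical function written for this illustration; it records the order of events when wg.Wait() is deferred first and cancel() is deferred second.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// shutdownOrder runs one goroutine that exits on cancellation and returns
// the order in which the deferred cancel and wait steps happened.
func shutdownOrder() []string {
	var (
		mu     sync.Mutex
		events []string
	)
	record := func(e string) {
		mu.Lock()
		events = append(events, e)
		mu.Unlock()
	}

	func() {
		ctx, cancel := context.WithCancel(context.Background())
		var wg sync.WaitGroup

		// Deferred calls run last-in, first-out, so the wait registered
		// here runs after the cancel registered below.
		defer func() {
			wg.Wait()
			record("wait returned")
		}()
		defer func() {
			record("cancel called")
			cancel()
		}()

		wg.Add(1)
		go func() {
			defer wg.Done()
			<-ctx.Done()
			record("worker stopped")
		}()
	}()

	return events
}

func main() {
	fmt.Println(shutdownOrder())
}
```

If the two defers were registered in the opposite order, wg.Wait() would run before cancel() and block forever, because the worker only exits once the context is cancelled.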


170-180: LGTM! Proper WaitGroup tracking for controller goroutines.

Both controller goroutines are correctly managed with wg.Add(1) before launch and defer wg.Done() inside, ensuring graceful shutdown coordination.

@tchap tchap force-pushed the certregeneration-refactoring branch from d93c607 to c619281 Compare January 21, 2026 18:05
@openshift-ci
Contributor

openshift-ci Bot commented Jan 21, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: tchap
Once this PR has been reviewed and has the lgtm label, please assign sanchezl for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tchap
Contributor Author

tchap commented Jan 21, 2026

/retest

@@ -74,23 +80,24 @@ func NewCABundleController(
}

func (c *CABundleController) Run(ctx context.Context) {
Contributor

Could this be rewritten to use the standard factory.New() controller?

Contributor Author

I guess it could, I will take a look when I have more spare time.

Contributor Author

@tchap tchap Apr 22, 2026

Rewritten to use factory.New.

func (c *CABundleController) Run(ctx context.Context) {
	defer utilruntime.HandleCrash()

	// FIXME: These are missing a wait group to track goroutines and handle graceful termination
Contributor

What does graceful termination mean in this context?

Contributor Author

Pretty much nothing; I don't think the command really handles signals.

Handling signals is always nice, but it's not really important here, I guess, and it's a bit orthogonal to handling goroutines. It's always good to have all goroutines terminated when Run returns.

@tchap tchap force-pushed the certregeneration-refactoring branch 2 times, most recently from d9eb289 to c47d6d6 Compare April 22, 2026 12:16

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/cmd/certregenerationcontroller/cmd.go (1)

127-127: Duplicate configInformers.Start() calls.

configInformers.Start(ctx.Done()) is called at line 127 and again at line 167. While SharedInformerFactory.Start() is idempotent (it safely no-ops for already-started informers), having it called twice may confuse future readers about the intended lifecycle.

Consider removing the first call at line 127 if the feature gate informers can wait until line 167, or add a brief inline comment explaining why the early start is required for featureGateAccessor to initialize.

Also applies to: 167-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cmd/certregenerationcontroller/cmd.go` at line 127, Duplicate call to
configInformers.Start(ctx.Done()) (SharedInformerFactory.Start) appears at the
earlier location and again later near where featureGateAccessor is used; remove
the first call so the factory is only started once at the intended lifecycle
point (the later start near featureGateAccessor) OR, if the early start is
actually required to initialize featureGateAccessor, keep the early
configInformers.Start call and add a brief inline comment next to that call
referencing featureGateAccessor initialization (e.g., "start early to allow
featureGateAccessor to initialize") so future readers understand the rationale.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e4b1f0a0-2060-497a-b330-d1e514dcd7aa

📥 Commits

Reviewing files that changed from the base of the PR and between c619281 and d9eb289.

📒 Files selected for processing (2)
  • pkg/cmd/certregenerationcontroller/cabundlesyncer.go
  • pkg/cmd/certregenerationcontroller/cmd.go


@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
pkg/cmd/certregenerationcontroller/cmd.go (1)

127-127: Duplicate configInformers.Start() call is safe but redundant.

configInformers.Start(ctx.Done()) is called at both line 127 and line 167. While SharedInformerFactory.Start() is idempotent (subsequent calls are no-ops for already-started informers), the duplicate call adds confusion. Consider removing line 167 since line 127 already starts the config informers before the feature gate accessor needs them.

🔧 Suggested fix
 	// We can't start informers until after the resources have been requested. Now is the time.
 	kubeAPIServerInformersForNamespaces.Start(ctx.Done())
 	dynamicInformers.Start(ctx.Done())
-	configInformers.Start(ctx.Done())

Also applies to: 167-167

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/cmd/certregenerationcontroller/cmd.go` at line 127, The call to
configInformers.Start(ctx.Done()) is duplicated; remove the second invocation
(the later call near where feature gate accessor is initialized) so only the
initial configInformers.Start(ctx.Done()) at the earlier setup remains; update
any nearby comments if needed to reflect a single start and ensure no other code
relies on the redundant call in functions/methods around cmd.go where
configInformers.Start is referenced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 14c47469-79f7-4a02-9e99-9315e7cd88ce

📥 Commits

Reviewing files that changed from the base of the PR and between d9eb289 and c47d6d6.

📒 Files selected for processing (2)
  • pkg/cmd/certregenerationcontroller/cabundlesyncer.go
  • pkg/cmd/certregenerationcontroller/cmd.go

@tchap tchap force-pushed the certregeneration-refactoring branch 2 times, most recently from 9824745 to d08e211 Compare April 22, 2026 13:10

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/cmd/certregenerationcontroller/cmd.go`:
- Around line 121-130: The startup currently blocks waiting for feature gates
without honoring context cancellation, so modify the bootstrap that waits for
InitialFeatureGatesObserved to also select on ctx.Done() (the cancellable
context created by ctx, cancel) so shutdown unblocks immediately; specifically,
update the logic around featureGateAccessor.Run(ctx) and the subsequent wait for
InitialFeatureGatesObserved() to use a select that listens for either
featureGateAccessor.InitialFeatureGatesObserved() completion (or its returned
channel/boolean) or ctx.Done(), and return/cleanup when ctx is cancelled instead
of waiting the full timeout.
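
The cancellation-aware wait suggested above can be sketched as a three-way select. This is a minimal illustration using only the standard library: waitForObserved, demo, and errTimedOut are hypothetical names, and the observed channel stands in for whatever signal the feature gate accessor exposes, which is an assumption rather than something confirmed here.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errTimedOut is a hypothetical sentinel error for this sketch.
var errTimedOut = errors.New("timed out waiting for initial feature gates")

// waitForObserved returns nil when observed is closed, ctx.Err() when the
// context is cancelled first, or errTimedOut when the timeout elapses first.
func waitForObserved(ctx context.Context, observed <-chan struct{}, timeout time.Duration) error {
	timer := time.NewTimer(timeout)
	defer timer.Stop()

	select {
	case <-observed:
		return nil
	case <-ctx.Done():
		return ctx.Err() // shutdown requested: stop waiting immediately
	case <-timer.C:
		return errTimedOut
	}
}

// demo exercises one branch at a time so the outcome is deterministic:
// either the context is already cancelled, or the signal is already closed.
func demo(cancelFirst bool) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	observed := make(chan struct{})
	if cancelFirst {
		cancel()
	} else {
		close(observed)
	}
	return waitForObserved(ctx, observed, time.Minute)
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```

With a cancelled context the wait returns right away instead of sitting out the full timeout, which is the behavior the review comment asks for.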

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: db861c3c-cc07-4515-8bc7-804b3f1ae177

📥 Commits

Reviewing files that changed from the base of the PR and between c47d6d6 and 9824745.

📒 Files selected for processing (2)
  • pkg/cmd/certregenerationcontroller/cabundlesyncer.go
  • pkg/cmd/certregenerationcontroller/cmd.go

Comment thread pkg/cmd/certregenerationcontroller/cmd.go
tchap added 2 commits April 22, 2026 17:04
Use a WaitGroup to manage all goroutines.
Rewrite the controller init to use factory.New to remove unnecessary
boilerplate code.
@tchap tchap force-pushed the certregeneration-refactoring branch from d08e211 to e6bd2b1 Compare April 22, 2026 15:04
@tchap
Contributor Author

tchap commented Apr 23, 2026

/retest

@openshift-ci
Contributor

openshift-ci Bot commented Apr 23, 2026

@tchap: all tests passed!



Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.


3 participants